Nonparametric Density Estimation and Clustering with Application to Cosmology

Author

  • Woncheol Jang
Abstract

We present a clustering method based on nonparametric density estimation. We use kernel smoothing and orthogonal series estimators to estimate the density $f$, and then extract the connected components of the level set using a modified version of the Cuevas et al. (2000) algorithm. We extend an idea due to Stein (1981) and Beran and Dümbgen (1998) to construct confidence sets for the level set $\{f > \delta c\}$ using the asymptotic distribution of the loss function. Specifically, we show the stochastic convergence of the pivot process

$$B_n(\lambda_p) = \sqrt{n}\,\bigl(L_p(\lambda_p) - \hat{S}_p(\lambda_p)\bigr),$$

where $L_p(\lambda_p)$ and $\hat{S}_p(\lambda_p)$ are the loss function and the estimated risk function with smoothing parameter $\lambda_p$. Inverting the pivot yields a confidence set for the coefficients of the orthogonal series estimator, from which confidence sets for functionals of $f$ can be constructed. We consider applications in astronomy and other fields.

Acknowledgment

This is joint work with Larry Wasserman, Chris Genovese and Bob Nichol.

References

[1] Beran, R. and Dümbgen, L. (1998). Modulation of estimators and confidence sets. Ann. Statist., 26, 1826-1856.
[2] Cuevas, A., Febrero, M. and Fraiman, R. (2000). Estimating the number of clusters. The Canadian Journal of Statistics, 28, 367-382.
[3] Jang, W. and Wasserman, L. (2003). Confidence sets for densities and clusters. In preparation.
[4] Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. Ann. Statist., 9, 1135-1151.
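The clustering step described in the abstract (estimate $f$, keep the region where the estimate exceeds a threshold, then take connected components) can be sketched as follows. This is a minimal sketch of the basic Cuevas et al. (2000) idea, not the author's modified algorithm, and it omits the orthogonal series estimator and the confidence sets entirely. It assumes a 2-D Gaussian kernel density estimator with a hypothetical bandwidth `h`, a hypothetical threshold `c`, and approximates the level set by a union of balls of hypothetical radius `eps` centered at the sample points whose estimated density exceeds `c`. Two such balls intersect when their centers are at most `2 * eps` apart, so clusters are the connected components of that proximity graph, found here with union-find.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def gaussian_kde(data: Sequence[Point], h: float) -> Callable[[Point], float]:
    """2-D kernel density estimate with an isotropic Gaussian kernel of bandwidth h (assumed kernel)."""
    n = len(data)
    norm = 1.0 / (n * 2.0 * math.pi * h * h)  # Gaussian normalizing constant, averaged over n points

    def f_hat(x: Point) -> float:
        total = 0.0
        for a, b in data:
            d2 = (x[0] - a) ** 2 + (x[1] - b) ** 2
            total += math.exp(-d2 / (2.0 * h * h))
        return norm * total

    return f_hat


def level_set_clusters(data: Sequence[Point], h: float, c: float, eps: float) -> List[List[Point]]:
    """Cluster the sample points lying in the estimated level set {f_hat > c}.

    Sketch of the basic Cuevas-style idea (hypothetical parameters h, c, eps):
    the level set is approximated by the union of eps-balls centered at the
    retained sample points, and clusters are the connected components of that union.
    """
    f_hat = gaussian_kde(data, h)
    # Step 1: keep only sample points whose estimated density exceeds c.
    kept = [p for p in data if f_hat(p) > c]

    # Step 2: union-find over retained points; two eps-balls overlap
    # exactly when their centers are at most 2 * eps apart.
    parent = list(range(len(kept)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(kept)):
        for j in range(i + 1, len(kept)):
            if math.dist(kept[i], kept[j]) <= 2.0 * eps:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # Step 3: group retained points by the root of their component.
    groups: Dict[int, List[Point]] = {}
    for i, p in enumerate(kept):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical toy data: two tight 5x5 grids and one isolated point.
    grid = [(0.1 * i, 0.1 * j) for i in range(-2, 3) for j in range(-2, 3)]
    points = grid + [(10 + x, 10 + y) for x, y in grid] + [(5.0, 5.0)]
    for k, cluster in enumerate(level_set_clusters(points, h=0.5, c=0.1, eps=0.5)):
        print(f"cluster {k}: {len(cluster)} points")
```

On the toy data, the isolated point at (5, 5) has an estimated density of about 0.0125, below `c = 0.1`, so it is dropped, and the two grids come back as two clusters of 25 points each. The pairwise loop is quadratic in the number of retained points; a spatial index would be the natural replacement for large data sets.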

Similar resources

A Fast Clustering Algorithm with Application to Cosmology

We present a fast clustering algorithm for density contour clusters (Hartigan, 1975) that is a modified version of the Cuevas, Febrero and Fraiman (2000) algorithm. By Hartigan's definition, clusters are the connected components of a level set $S_c \equiv \{f > c\}$, where $f$ is the probability density function. We use kernel density estimators and orthogonal series estimators to estimate $f$ and modify the...

Statistical Topology Using the Nonparametric Density Estimation and Bootstrap Algorithm

This paper presents approximate confidence intervals for each function of parameters in a Banach space based on a bootstrap algorithm. We apply the kernel density approach to estimate the persistence landscape. In addition, we evaluate the quality of the distribution function estimator of random variables using the integrated mean square error (IMSE). The results of simulation studies show a significant impro...

Bayesian Density Regression and Predictor-dependent Clustering

JU-HYUN PARK: Bayesian Density Regression and Predictor-Dependent Clustering. (Under the direction of Dr. David Dunson.) Mixture models are widely used in many application areas, with finite mixtures of Gaussian distributions applied routinely in clustering and density estimation. With the increasing need for a flexible model for predictor-dependent clustering and conditional density estimation...

Fast Estimation of Nonparametric Kernel Density Through PDDP, and its Application in Texture Synthesis

In this work, a new algorithm is proposed for fast estimation of nonparametric multivariate kernel density, based on principal direction divisive partitioning (PDDP) of the data space. The goal of the proposed algorithm is to use the finite support property of kernels for fast estimation of density. Compared to earlier approaches, this work explains the need of using boundaries (for partitioning the sp...

On a Theory of Nonparametric Pairwise Similarity for Clustering: Connecting Clustering to Classification

Pairwise clustering methods partition the data space into clusters by the pairwise similarity between data points. The success of pairwise clustering largely depends on the pairwise similarity function defined over the data points, where kernel similarity is broadly used. In this paper, we present a novel pairwise clustering framework by bridging the gap between clustering and multi-class class...

Publication date: 2003